
bedtools getfasta \
    -fi ref/hg19.fa \
    -bed motifs/chip3_peaks.bed \
    -fo motifs/chip3_peaks.fasta

These three FASTA files contain the sequences of the enriched peaks for each sample, and we will use them as inputs for the motif detection programs.
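Before running motif discovery, it can be useful to check how many sequences each FASTA file holds, since the size of the input affects which program is a sensible choice. The following is a minimal sketch, not part of the original workflow: the helper name count_fasta_records is hypothetical, and the glob motifs/*_peaks.fasta assumes the other two samples follow the same naming pattern as chip3_peaks.fasta.

```shell
#!/usr/bin/env bash
# count_fasta_records: hypothetical helper that prints the number of
# FASTA records in a file by counting header lines (lines starting with ">").
count_fasta_records() {
    # grep -c exits non-zero when the count is 0, so "|| true" keeps
    # the function usable in scripts that run with "set -e".
    grep -c '^>' "$1" || true
}

# Assumed naming pattern: motifs/<sample>_peaks.fasta
for f in motifs/*_peaks.fasta; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    echo "$f: $(count_fasta_records "$f") sequences"
done
```

Each line of output pairs a file name with its record count, which should equal the number of peaks in the corresponding BED file.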

There are two approaches to motif detection: the de novo method, which assumes no prior information, and the position weight matrix (PWM) method, which is used for known motifs.

The de novo approach searches the input FASTA sequences for motifs without prior information about them. The search is conducted in a window around each peak. Motif discovery programs follow one of two strategies. k-mer-based programs break the sequences into k-mers and search exhaustively for the most frequent consensus substrings, which are reported as motifs. Alignment-based programs align the sequences iteratively to build a PWM, from which the consensus motif is read as the most frequent nucleobase at each position. An example of a de novo motif discovery package is the MEME Suite [11], which includes the DREME, MEME, and STREME programs for discovering ungapped motifs. DREME is k-mer based, but it is deprecated and will not be supported in the future. MEME is an alignment-based motif discovery tool, but it is recommended only for datasets of fewer than 50 sequences. STREME is k-mer based and is recommended for detecting motifs in datasets of more than 50 sequences. The MEME Suite is available as a web server and as command-line programs. To use the web server or to download and install the MEME Suite, visit “https://meme-suite.org/meme/”. On Linux, you can download and install the MEME Suite with the following steps:

wget https://meme-suite.org/meme/meme-software/5.4.1/meme-5.4.1.tar.gz
tar vxf meme-5.4.1.tar.gz
cd meme-5.4.1
./configure --prefix=$HOME/meme --enable-build-libxml2 --enable-build-libxslt
make
make test
make install

Once the installation has finished, add the following line to your “.bashrc” file:

export PATH=$HOME/meme/bin:$HOME/meme/libexec/meme-5.4.1:$PATH

The version may change, so it is best to visit the MEME Suite website for the latest installation instructions.

After adding the above line to the “.bashrc” file, you may need to restart the terminal or run “source ~/.bashrc” for the change to take effect.
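To confirm that the PATH change has taken effect, you can check whether the shell is able to locate the MEME Suite executables. The sketch below is only an illustration: check_tool is a hypothetical helper, and the choice of meme and streme as the programs to look for is an assumption based on the tools discussed above.

```shell
#!/usr/bin/env bash
# check_tool: hypothetical helper that reports whether a program
# can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}

# Look for two of the MEME Suite programs mentioned in the text.
for tool in meme streme; do
    check_tool "$tool" || true   # keep going even if one is missing
done

# If meme is available, print its version string.
if command -v meme >/dev/null 2>&1; then
    meme -version
fi
```

If either program is reported as not found, check that the directories in the export line match the version and prefix that were actually installed.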

The MEME Suite programs require the ChIP-Seq dataset in FASTA format (the primary dataset) and a control dataset (the secondary dataset). If no control dataset is used, MEME Suite